Abstract: With the explosive growth of web-based cameras and mobile devices, billions of photographs are uploaded to the Internet. It is now trivial to collect a huge number of photo streams for various goals, such as 3D scene reconstruction and other big data applications. Using these photos, however, is not an easy task because the retrieved photos are neither aligned nor calibrated. Furthermore, occlusion by unexpected foreground objects such as people and vehicles makes it even more challenging to find feature correspondences and reconstruct realistic scenes. In this paper, we propose a structure-based image completion algorithm for object removal that produces visually plausible content with consistent structure and scene texture. We use an edge matching technique to infer the potential structure of the unknown region. Driven by the estimated structure, texture synthesis is performed automatically along the estimated curves. We evaluate the proposed method on different types of images, from highly structured indoor environments to natural scenes. Our experimental results demonstrate satisfactory performance that can potentially be used for subsequent big data processing, such as 3D scene reconstruction and location recognition.

Keywords: Image Completion, Texture Synthesis, Online Photos, Scene Reconstruction, Object Removal

I. Introduction 

In the past few years, the massive collections of imagery on the Internet have inspired a wave of work on many interesting big data topics: scene reconstruction, location recognition, and online sharing of personal photo streams [?]. For example, one can easily download a huge number of photo streams associated with a particular place. By using features (e.g., SIFT), it is possible to automatically estimate correspondence information and reconstruct the 3D geometry of the scene [?], [6]. Imagine building a world-scale location recognition engine from all of the geotagged images in online photo collections, such as Flickr and the street view databases from Google and Microsoft. However, this is a challenging task because the photo streams are neither aligned nor calibrated: they are taken from different temporal, spatial, and personal perspectives. Furthermore, occlusion by unexpected foreground objects makes it even more difficult to recover the whole scene or accurately identify overlapping regions between different photos.

Image in-painting is an effective solution to the above issue. In this paper, we propose an automatic object removal algorithm for scene completion, which benefits the subsequent processing of large image collections. The core of our method is structure and texture consistency. Our proposed approach makes two major contributions. First, we develop a curve estimation approach to infer the potential structure of the occluded region of the image. Second, we design an oriented patch matching algorithm for texture propagation. Our work has a broad range of applications, including image localization [?], [8], privacy protection [9], [?], and other network-based applications [?].

II. Related Works 

In the literature, image completion or in-painting has been intensively studied. In [?], Efros and Leung used a one-pass greedy algorithm to render unknown pixels, based on the assumption that the probability distribution of a pixel's brightness is independent of the rest of the image once its spatial neighborhood is given. In [18], the authors proposed an example-based approach to fill in the missing regions. It worked well for small gaps but not for large ones; the weakness of such an approach is that it fails to preserve the potential structures. Jia et al. [?] designed an image in-painting method based on texture segmentation and tensor voting that created smooth linking structures in the occluded regions. This method sometimes introduces noticeable artifacts due to texture inconsistency. Criminisi et al. [?] made an improvement by assigning the in-painting order based on edge strength levels. Their algorithm used a confidence map and the image edges to determine the patch completion priority. However, the structures in the resulting images are not well preserved. The method in [21] produced better results via structure propagation, but it requires more user interaction, and the completion results largely depend on the user's individual technique. Other existing work is explored in [22], [23], [24].

Fig. 1. Scene recovery by removing a specified foreground object: (a) original image; (b) our result; (c) contour detection using the OWT-UCM method [?]; (d) edge extraction; (e) structure generation in the occluded region by identifying corresponding edge pairs; (f) notation used in our algorithm.

III. Our Approach 

Our framework proceeds as follows: for a given image, the user specifies the object to be removed by drawing a closed contour around it. The enclosed area is treated as an unknown region that is inferred from the remaining region of the image and replaced accordingly. Figure 1(a) shows an example in which the red car is selected as the object to remove. In the resulting image, Figure 1(b), the occluded region is automatically recovered based on the surrounding environment.

First, let us define the notation used in the rest of the paper. For an image $I$, the target region for in-painting is denoted as $\Omega$; the remaining part of the image is denoted as $\Phi$ ($= I - \Omega$), which is also known as the source region. The boundary contour of $\Omega$ is denoted as $\partial\Omega$. A pixel's value is represented by $p = I(x, y)$, where $x$ and $y$ are the 2D coordinates on the image. The surrounding neighborhood centered at $(x, y)$ is called a patch, denoted as $\Psi_p$. The coordinates of pixels inside the patch lie in the range $[x \pm \Delta x, y \pm \Delta y]$. These concepts are illustrated in Figure 1(f). Our framework achieves scene recovery in three phases: structure estimation, structure propagation, and remaining part filling.
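The notation above maps naturally onto array operations. The following is a minimal sketch, assuming the target region is given as a boolean mask and that the boundary is taken as the target pixels with a 4-connected source neighbor; the function names `split_regions` and `extract_patch` are hypothetical and not part of the original method.

```python
import numpy as np

def split_regions(omega):
    """Return (phi, boundary) for a boolean target mask omega.

    phi      : source region, the complement of omega
    boundary : pixels of omega that touch phi through a 4-neighbor
               (an assumed discrete stand-in for the contour of omega)
    """
    omega = np.asarray(omega, dtype=bool)
    phi = ~omega
    padded = np.pad(phi, 1, constant_values=False)
    # A target pixel lies on the boundary if any 4-neighbor is a source pixel.
    touches_phi = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
                   padded[1:-1, :-2] | padded[1:-1, 2:])
    return phi, omega & touches_phi

def extract_patch(image, x, y, half):
    """Patch centered at column x, row y, clipped at the image border."""
    h, w = image.shape[:2]
    return image[max(y - half, 0):min(y + half + 1, h),
                 max(x - half, 0):min(x + half + 1, w)]
```

With `half` playing the role of $\Delta x = \Delta y$, `extract_patch` returns the square neighborhood around a pixel, shrinking it only where the patch would leave the image.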

A. Structure Estimation 

In this phase, we estimate the potential structure in $\Omega$ by finding all the possible edges. This procedure can be further decomposed into two steps: Contour Detection in $\Phi$ and Curve Generation in $\Omega$.

1) Contour Detection in $\Phi$: We first segment the region $\Phi$ using the gPb Contour Detector [?]. It is based on computing the oriented gradient signal $G(x, y, \theta)$ on four channels of the transformed image: brightness, color a, color b, and texture. Here $(x, y)$ is the center of the circular mask drawn on the image and $\theta$ is the orientation. The gPb detector is composed of two important components: the mPb Edge Detector and the sPb Spectral Detector [?]. We linearly combine mPb and sPb, weighted by $\beta$ and $\gamma$, which are obtained by gradient ascent on the F-measure:

\[ gPb(x, y, \theta) = \beta \cdot mPb(x, y, \theta) + \gamma \cdot sPb(x, y, \theta) \tag{1} \]

Thus, a set of edges in $\Phi$ can be retrieved via gPb. However, these edges are not closed and have classification ambiguities. To solve this problem, we use the Oriented Watershed Transform [?] and Ultrametric Contour Map [?] (OWT-UCM) algorithm to find the potential contours by segmenting the image into different regions. The output of OWT-UCM is a set of contours $\{C_i\}$ and their corresponding boundary strength levels $\{L_i\}$, as Figure 1(c) shows.
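Equation (1) is a per-pixel, per-orientation weighted sum of the two detector responses. A minimal sketch follows, assuming both responses are already available as arrays of identical shape (for example height x width x number of orientations); the function name `combine_gpb` is hypothetical, and the weights must be supplied by the caller because their learned values are not given here.

```python
import numpy as np

def combine_gpb(mpb, spb, beta, gamma):
    """Linear combination of Eq. (1): gPb = beta * mPb + gamma * sPb.

    mpb, spb    : detector responses with identical shapes (assumed inputs)
    beta, gamma : combination weights supplied by the caller
    """
    mpb = np.asarray(mpb, dtype=float)
    spb = np.asarray(spb, dtype=float)
    if mpb.shape != spb.shape:
        raise ValueError("mPb and sPb must have the same shape")
    return beta * mpb + gamma * spb
```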

2) Curve Generation in $\Omega$: After obtaining the contours $\{C_i\}$ from the above procedure, salient boundaries in $\Phi$ can be found by traversing $\{C_i\}$. Our method for generating the curves in $\Omega$ is based on the following assumption: an edge in $\Phi$ that intersects $\partial\Omega$ either ends inside $\Omega$ or passes through the missing region $\Omega$ and exits at another point of $\partial\Omega$. Our algorithm for identifying the curve segments in $\Omega$ is given below:


Algorithm III.1 Identifying curve segments in $\Omega$
Require: Construct curve segments in $\Omega$.
Ensure: The generated curves have smooth transitions between known edges.
1: Initialize $t = 1.0$
2: For $t = t - \Delta t$
3: if $\exists e \in \{C\} : e \cap \partial\Omega \neq \emptyset$
4: Insert $e$ into $\{E\}$
5: End if $t < \delta_t$
6: Set $t = t_0$, retrieve all the contours in $\{C_i\}$ with $L_i > t$
7: Obtain $\langle \phi_{x1}, \phi_{x2} \rangle$ for each $E_x$
8: DP on $\{\langle \phi_{01}, \phi_{02} \rangle, \langle \phi_{11}, \phi_{12} \rangle, \ldots\}$ to find optimal pairs from the list.
9: According to the optimal pairs, retrieve all the corresponding edge pairs: $\{(E_{x1}, E_{x2}), (E_{x3}, E_{x4}), \ldots\}$.
10: Compute a transition curve $C_{st}$ for each $(E_s, E_t)$.


Algorithm III.1 has three main parts: (a) collect all potential edges $\{E_i\}$ in $\Phi$ that hit $\partial\Omega$; (b) identify optimal edge pairs $\{(E_s, E_t)\}$ from $\{E_i\}$; (c) construct a curve $C_{st}$ for each edge pair $(E_s, E_t)$.

Edges Collection: The output of OWT-UCM is a set of contours $\{C_i\}$ and their corresponding boundary strength levels $\{L_i\}$. Given a threshold $t$, one can remove the contours $C$ whose strength $L$ is weak. Motivated by this, we use a Region-Split scheme to gradually split the whole of $\Phi$ into multiple sub-regions and extract the salient curves. This process is carried out on lines 1-5: at the beginning, the whole region $\Phi$ is considered as one contour; then $t$ is decreased iteratively so that potential sub-contours $\{C_i\}$ emerge according to their boundary strength. Whenever an edge $e$ from the newly emerged contours $\{C\}$ is detected to intersect $\partial\Omega$, it is put into the set $\{E\}$.
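The threshold sweep of the edge collection step can be sketched as follows. This is a simplified illustration, assuming the boundary strengths are available as a 2D array in [0, 1] and that "intersecting the boundary" is approximated by a contour pixel lying on the ring of source pixels adjacent to the target mask. The names `ring_around` and `collect_boundary_edges` and the default step sizes are hypothetical.

```python
import numpy as np

def ring_around(omega):
    """Source pixels that touch the target mask omega through a 4-neighbor."""
    padded = np.pad(omega, 1, constant_values=False)
    near = (padded[:-2, 1:-1] | padded[2:, 1:-1] |
            padded[1:-1, :-2] | padded[1:-1, 2:])
    return near & ~omega

def collect_boundary_edges(strength, omega, dt=0.1, t_min=0.2):
    """Sweep t from 1.0 down to t_min and record where contours reach the ring.

    strength : 2D array of boundary strengths in [0, 1] (e.g. a UCM)
    omega    : boolean mask of the target region
    Returns a list of (t, row, col) in order of emergence.
    """
    strength = np.asarray(strength, dtype=float)
    omega = np.asarray(omega, dtype=bool)
    ring = ring_around(omega)
    seen = np.zeros_like(omega)
    found = []
    steps = int(round((1.0 - t_min) / dt))
    for k in range(steps + 1):
        t = 1.0 - k * dt  # integer stepping avoids accumulated float drift
        hits = ring & (strength >= t - 1e-9) & ~seen
        for r, c in zip(*np.nonzero(hits)):
            found.append((round(t, 6), int(r), int(c)))
        seen |= hits
    return found
```

Stronger contours are recorded first, so the returned order reflects the strength level at which each edge endpoint emerged; grouping pixels into edges is left out of this sketch.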

Optimal Edge Pairs: Identifying edge pairs is motivated by the assumption that if an edge is broken up by $\Omega$, there must exist a pair of corresponding contour edges in $\Phi$ that intersect $\partial\Omega$. To find the potential pairs $\{(E_s, E_t)\}$ from the edge list $\{E_i\}$, we measure the similarity of the corresponding enclosed regions. The neighboring regions $\langle \phi_{s1}, \phi_{s2} \rangle$ partitioned by the edge $E_s$ are compared with the corresponding regions of another edge $E_t$. This procedure is described on lines 7-9 of Algorithm III.1. Each neighboring region is obtained by lowering the threshold value $t$ to reveal more detailed contours, as Figure 1(d) shows.

To compute the similarity between regions, we use the Jensen-Shannon divergence [?], which works on the color histograms:

\[ d(H_1, H_2) = \sum_{i=1}^{n} \left\{ H_1^i \cdot \log \frac{2 H_1^i}{H_1^i + H_2^i} + H_2^i \cdot \log \frac{2 H_2^i}{H_1^i + H_2^i} \right\} \tag{2} \]

where $H_1$ and $H_2$ are the histograms of the two regions $\phi_1$, $\phi_2$, and $i$ is the index of the histogram bin. For any two edges $(E_s, E_t)$, the similarity between them can be expressed as:

\[ M(E_s, E_t) = [\ldots] \cdot \min \{ d(H_{si}, H_{ti}) + d(H_{sj}, H_{tj}) \} \tag{3} \]

$i$ and $j$ are the two distinct indices in $\{1, 2\}$, which label the two neighboring regions in $\Phi$ around a particular edge. $L_{max}$ is the larger of the two compared edges' strength levels. The first multiplier is a penalty term for a large difference between the strength levels of the two edges. To find the optimal pairs in the edge list, dynamic programming is used to minimize the global distance $\sum_{s,t} M(E_s, E_t)$, where $s \neq t$ and $s, t \in \{0, 1, \ldots, size(\{E_i\})\}$. To enhance accuracy, a maximum constraint limits the regions' difference: $d(H_1, H_2) < \delta_H$. If an individual distance exceeds the pre-specified threshold $\delta_H$, the corresponding region matching is not considered. This ensures that when no similar edges exist, no matching pairs are identified.
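The region comparison can be sketched in a few lines. This is a minimal sketch, assuming the histograms are non-negative arrays that are normalized before comparison and that the minimum is taken over the two possible ways of matching the side regions of one edge to those of the other. The strength-level penalty factor is deliberately left out, and the function names are hypothetical.

```python
import numpy as np

def js_divergence(h1, h2):
    """Histogram distance following the form of Eq. (2)."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    total = h1 + h2

    def term(h):
        mask = h > 0  # empty bins contribute nothing (0 * log 0 := 0)
        return float(np.sum(h[mask] * np.log(2.0 * h[mask] / total[mask])))

    return term(h1) + term(h2)

def edge_distance(hists_s, hists_t, delta_h):
    """Smallest summed region distance over both side assignments.

    hists_s, hists_t : pairs of histograms for the two regions beside each edge
    delta_h          : maximum allowed distance for any single region match
    Returns None when no assignment satisfies the delta_h constraint.
    """
    (s1, s2), (t1, t2) = hists_s, hists_t
    options = []
    for first, second in (((s1, t1), (s2, t2)), ((s1, t2), (s2, t1))):
        d1, d2 = js_divergence(*first), js_divergence(*second)
        if d1 < delta_h and d2 < delta_h:
            options.append(d1 + d2)
    return min(options) if options else None
```

Returning `None` mirrors the rule that edges whose neighboring regions are too dissimilar are never paired.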

Generate Curves for each $(E_s, E_t)$: We adopt the idea of first fitting clothoid segments to polyline stroke data before generating a curve [27]. Initially, a series of discrete points along the two edges $E_s$ and $E_t$ is selected, denoted as $\{p_{s0}, p_{s1}, \ldots, p_{sn}, p_{t0}, p_{t1}, \ldots, p_{tm}\}$. Adjacent points are separated by a pre-specified distance $\Delta$. For any three adjacent points $\{p_{i-1}, p_i, p_{i+1}\}$, the corresponding curvature $k_i$ can be computed according to [28]:

\[ k_i = \frac{2 \cdot \det(p_i - p_{i-1},\; p_{i+1} - p_i)}{\|p_i - p_{i-1}\| \cdot \|p_{i+1} - p_i\| \cdot \|p_{i+1} - p_{i-1}\|} \tag{4} \]
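The three-point curvature estimate can be checked numerically. The sketch below assumes 2D points given as coordinate pairs; the function name `discrete_curvature` is hypothetical. The sign of the result indicates the turning direction, and three points on a circle of radius $r$ yield a magnitude of $1/r$.

```python
import numpy as np

def discrete_curvature(p_prev, p, p_next):
    """Signed curvature at p estimated from its two neighbors."""
    a = np.subtract(p, p_prev).astype(float)       # p_i - p_{i-1}
    b = np.subtract(p_next, p).astype(float)       # p_{i+1} - p_i
    c = np.subtract(p_next, p_prev).astype(float)  # p_{i+1} - p_{i-1}
    denom = np.linalg.norm(a) * np.linalg.norm(b) * np.linalg.norm(c)
    if denom == 0.0:
        raise ValueError("the three points must be pairwise distinct")
    det = a[0] * b[1] - a[1] * b[0]  # 2x2 determinant of the two steps
    return 2.0 * det / denom
```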

Combining the above curvature values, a sequence of polylines is used to fit these points. The polylines are expected to have as few line segments as possible while keeping the distance to the original data minimal. Dynamic programming is used to find the most satisfactory polyline sequence by imposing a penalty on each additional line segment. A set of clothoid segments can then be derived, one for each line segment. After a series of rotations and translations of the clothoids, a final curve $C$ is obtained by connecting each adjacent pair so that the transitions remain continuous [27]. Figure 1(e) demonstrates the curve generation result.

B. Structure Propagation

After the potential curves are generated in $\Omega$, a set of texture patches, denoted as $\{\Psi_t\}$, needs to be found in the remaining region $\Phi$ and placed along the estimated curves, overlapping one another by a certain proportion. Similar to [?], an energy minimization based method is proposed in a Belief Propagation (BP) framework. However, we define the energy and message passing functions differently. The details are given in Algorithm III.2.


Algorithm III.2 BP Propagation Algorithm
Require: Render the texture for each patch in $\Omega$ along the estimated structures.
Ensure: Find the best matching patches while ensuring global coherence and consistency.
1: For each curve $C$ in $\Omega$, define a series of anchor points on it, $\{a_i \mid i = 1 \to n\}$.
2: Collect exemplar-texture patches $\{\Psi_{t_i}\}$ in $\Phi$, where $t_i \in [1, m]$
3: Set up a factor graph $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ based on $\{C\}$ and $\{a_i\}$
4: Define the energy function $E$ for each $a_i$: $E_i(t_i)$, where $t_i$ is the index in $[1, m]$.
5: Define the message function $M_{ij}$ for each edge $\mathcal{E}_{ij}$ in $\mathcal{G}$, with initial value $M_{ij} \leftarrow 0$
6: Iteratively update all the messages $M_{ij}$ passed between $\{a_i\}$
7: $M_{ij} \leftarrow \min_{t_i} \{ E_i(t_i) + E_{ij}(t_i, t_j) + \sum_{k \in \mathcal{N}(i), k \neq j} M_{ki} \}$
8: until $\Delta M_{ij} < \delta$, $\forall i, j$ (convergence)
9: Assign the best matching texture patch from $\{\Psi_t\}$ to each $a_i$ according to $\arg\min_{[T, R]} \{ \sum_{i \in \mathcal{V}} E_i(t_i) + \sum_{(i,j) \in \mathcal{E}} E_{ij}(t_i, t_j) \}$. Here $T$ is an $n$-dimensional vector $[t_1, t_2, \ldots, t_n]$, where $i \in [1, n]$; $R$ is also an $n$-dimensional vector $[r_1, r_2, \ldots, r_n]$, with each element representing the orientation of the source patch $\Psi_{t_i}$.


In the algorithm, the anchor points are evenly distributed along the curves, separated by an equal distance $\Delta d$. These points are the centers at which the patches $\Psi_i$ ($l \times l$) are synthesized, as shown in Figure 1(f). In practice, $\Delta d$ is set to a fixed fraction of $l$. $\{\Psi_t\}$ are the source texture patches in $\Phi$; they are chosen from the neighborhood around $\partial\Omega$. To build the factor graph, we consider each $a_i$ as a vertex $V_i$ and $\mathcal{E}_{ij} = \overline{a_i a_j}$ as an edge, where $i, j$ are two adjacent points.
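Placing anchors at equal spacing along a curve amounts to resampling it by arc length. The following is a minimal sketch, assuming the estimated curve is available as a dense polyline of 2D points; the function name `sample_anchors` is hypothetical and linear interpolation between polyline vertices is an assumption made for simplicity.

```python
import numpy as np

def sample_anchors(curve, spacing):
    """Points spaced `spacing` apart (by arc length) along a polyline.

    curve   : sequence of (x, y) vertices describing the curve
    spacing : arc-length distance between consecutive anchors
    """
    pts = np.asarray(curve, dtype=float)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    arc = np.concatenate(([0.0], np.cumsum(seg)))  # arc length at each vertex
    targets = np.arange(0.0, arc[-1] + 1e-9, spacing)
    x = np.interp(targets, arc, pts[:, 0])
    y = np.interp(targets, arc, pts[:, 1])
    return np.stack([x, y], axis=1)
```

Each returned row can then serve as a patch center, with consecutive rows forming the edges of the chain graph.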

In previous works [?], [20], each $\Psi_i$ has the same orientation as $\Psi_{t_i}$, which limits the variety of the source texture. Noticing that different patch orientations can produce different results, we introduce a scheme called Adaptive Patch by defining new formulas for $E$ and $M$ in the structure propagation.

Traditionally, the node energy $E_i(t_i)$ is defined as the Sum of Squared Differences (SSD), computed by comparing the known pixels in each patch $\Psi_i$ with the corresponding portion of the candidate $\Psi_{t_i}$. But this method limits the salient structure directions. Instead of applying SSD directly to the two patches, a series of rotations is performed on the candidate patch before computing the similarity. Mathematically, $E_i(t_i)$ can be formulated as:

\[ E_i(t_i) = \alpha_\lambda \cdot P \cdot \sum \| \Psi_i - R(\theta) \cdot \Psi_{t_i} \|^2 \tag{5} \]

where $R$ represents different rotations of the patch $\Psi_{t_i}$. Since a patch is usually small, the rotation can be specified with an arbitrary number of angles. In our experiment, it is specified as $\theta \in \{0, \pm\frac{\pi}{2}, \pi\}$. The parameter $\lambda$ represents the number of known pixels in $\Psi_i$ that overlap with the rotated patch $\Psi_{t_i}$. $P$ is a penalty term: the more overlapping pixels, the higher the similarity that is assigned, so we use $P$ to discourage patches that share fewer pixels. $P$ is expressed as a percentage computed from $\lambda$ and $l$ (the side length of $\Psi$). $\alpha_\lambda$ is the corresponding normalizing scalar. Thus the best matching patch is represented by two factors: index $t_i$ and rotation $R_i$.
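The rotation search can be illustrated with quarter-turn rotations, which are lossless on square patches. This is a simplified sketch: it averages the squared differences over the known pixels instead of applying the exact weighting by the normalizing scalar and penalty term, which is an assumption, and the function name `rotated_ssd` is hypothetical.

```python
import numpy as np

def rotated_ssd(target, known, candidate):
    """Lowest mean squared difference over the four quarter-turn rotations.

    target    : square patch being filled
    known     : boolean mask of target pixels that are already known
    candidate : square source patch of the same size
    Returns (energy, k), where np.rot90(candidate, k) is the best rotation.
    """
    target = np.asarray(target, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    known = np.asarray(known, dtype=bool)
    n_known = int(known.sum())
    if n_known == 0:
        raise ValueError("the target patch has no known pixels to compare")
    best = None
    for k in range(4):  # rotations by 0, 90, 180 and 270 degrees
        diff = (target - np.rot90(candidate, k))[known]
        energy = float(np.sum(diff ** 2)) / n_known
        if best is None or energy < best[0]:
            best = (energy, k)
    return best
```

The returned pair corresponds to the two factors that describe a match: the energy is evaluated per candidate index, and `k` records the chosen rotation.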

In a similar way, the energy $E_{ij}(t_i, t_j)$ on each edge $\mathcal{E}_{ij}$ can be expressed as:

\[ [\ldots] \tag{6} \]

Here $i$ and $j$ are the indices of two adjacent patches in $\Omega$. A penalty scheme is applied to the similarity comparison. The two parameters for $\Psi_i$ indicate the index and rotation of the source patches in $\Phi$. The message propagation is derived from the above energy functions. We adopt a method similar to [?], where the message $M_{ij}$ passed between patches is defined as:

\[ M_{ij} = E_i(t_i) + E_{ij}(t_i, t_j) \tag{7} \]

Through iterative updating on the BP graph, an optimal decision of $\{t_i\}$ for the patches in $\{\Psi_t\}$ is made by minimizing the nodes' energy. This principle can be formulated as:

\[ \hat{t}_i = \arg\min_{t_i} \Big\{ E_i(t_i) + \sum_{k} M_{ki} \Big\} \tag{8} \]

where $k$ is one of the neighbors of the patch $\Psi_i$, $k \in \mathcal{N}(i)$, and $\hat{t}_i$ is the optimal index for the matching patch. To achieve the minimum global energy cost, dynamic programming is used. Each assignment for $\Psi_i$ or $a_i$ is considered a stage. In each stage, the choices of $\Psi_{t_i}$ represent different states. The edge $\mathcal{E}_{ij}$ represents the transition cost from state $\Psi_{t_i}$ at stage $i$ to state $\Psi_{t_j}$ at stage $j$. Starting from $i = 0$, an optimal solution is achieved by minimizing the total energy from the last step:

\[ \mathcal{M}_i(t_i) = E_i(t_i) + \min \{ E_{ij}(t_i, t_j) + \mathcal{M}_{i-1}(t_{i-1}) \} \tag{9} \]

where $\mathcal{M}_i(t_i)$ represents the set of total energy values at the current stage $i$. For multiple intersections among the curves $C$, we adopt the idea in [?], to which readers can refer for further details.
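For a single curve without intersections, the anchors form a chain and the stage-wise minimization can be written as a standard dynamic program. The sketch below assumes the node and pairwise energies have already been tabulated as arrays; the function name `chain_dp` and the array layout are hypothetical, and curve intersections are not handled.

```python
import numpy as np

def chain_dp(node_cost, pair_cost):
    """Minimum-energy labeling of a chain of anchors.

    node_cost : (n, m) array, node_cost[i, t] is the node energy of label t
    pair_cost : (n - 1, m, m) array, pair_cost[i, s, t] is the energy of
                label s at anchor i followed by label t at anchor i + 1
    Returns (labels, total_energy).
    """
    node_cost = np.asarray(node_cost, dtype=float)
    pair_cost = np.asarray(pair_cost, dtype=float)
    n, m = node_cost.shape
    total = node_cost[0].copy()            # best energy ending in each label
    back = np.zeros((n, m), dtype=int)     # best predecessor label per stage
    for i in range(1, n):
        # cand[s, t]: best energy with label s at i - 1 and label t at i
        cand = total[:, None] + pair_cost[i - 1]
        back[i] = np.argmin(cand, axis=0)
        total = node_cost[i] + cand.min(axis=0)
    labels = [int(np.argmin(total))]
    for i in range(n - 1, 0, -1):          # trace the optimal path backwards
        labels.append(int(back[i, labels[-1]]))
    labels.reverse()
    return labels, float(total.min())
```

With `m` candidate patches per anchor, this runs in time proportional to `n * m * m`, compared with `m ** n` for exhaustive search.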

C. Remaining Part Filling

After the curves are generated in $\Omega$, we fill the remaining regions using the exemplar-based approach in [?]. The boundary $\partial\Omega$ shrinks progressively as the known pixels of $\Phi$ are propagated inward in a certain order. To enhance accuracy, all the pixels in the patches generated above along the estimated curves are assigned a pre-computed confidence value based on the confidence updating rule in [20].
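In exemplar-based filling of this kind, the confidence of a patch is commonly computed as the sum of the confidence values of its already-known pixels divided by the patch area. The sketch below illustrates that term only, under the assumption that confidence is stored as an array initialized to 1 in the source region and 0 in the target region; the function name `patch_confidence` is hypothetical and the isophote-based data term is not shown.

```python
import numpy as np

def patch_confidence(confidence, omega, row, col, half):
    """Confidence of the patch centered at (row, col).

    confidence : 2D array of per-pixel confidence values
    omega      : boolean mask of pixels that are still unknown
    half       : half the patch side length
    """
    h, w = confidence.shape
    r0, r1 = max(row - half, 0), min(row + half + 1, h)
    c0, c1 = max(col - half, 0), min(col + half + 1, w)
    conf = confidence[r0:r1, c0:c1]
    known = ~omega[r0:r1, c0:c1]
    # Only known pixels contribute; the sum is normalized by the patch area.
    return float(conf[known].sum()) / conf.size
```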

IV. Experiments 


Fig. 2. Kanizsa triangle experiment: (a) the original image; (b) curve reconstruction for the missing region $\Omega$; (c) result of Criminisi's method; (d) our result.

In our experiments, we first evaluate the proposed approach in terms of structure coherence by comparing our result with that of [?] on the well-known Kanizsa triangle. As shown in Figure 2(a), the white triangle in the front is considered the occluded region $\Omega$ that needs to be removed. First, structure propagation is carried out based on the detected edges along $\partial\Omega$. The dashed lines in Figure 2(b) indicate the estimated potential structures in $\Omega$. Texture propagation is then applied to the rest of the image based on the confidence and isophote terms. Both the triangle and the circles are well completed in our result, Figure 2(d), compared with Criminisi's method in Figure 2(c).
To further demonstrate the performance, a set of images ranging from indoor environments to natural scenes is used for scene recovery. Figure 3(e) shows an indoor case, where highly structured patterns such as furniture, windows, and walls are often present. The green bottle on the office partition is successfully removed while the remaining structure is preserved. In this example, three pairs of edges are identified and connected by the corresponding curves generated in the occluded region $\Omega$. Figures 3(g) and 3(f) show the results of removing trees from natural scenes. Several curves are inferred by matching the broken edges along $\partial\Omega$ and maximizing the continuity. The three layers of the scene (sky, background trees, and grassland) are well completed. Figure 3(h) shows a case in which a perching bird is removed from a tree. Our structure estimation successfully completes the tree branch with smooth geometric and texture transitions.

V. Conclusion 

In this paper, we present a novel approach for removing foreground objects while ensuring structure coherence and texture consistency. The core of our approach is to use structure as guidance for completing the remaining scene. This work can benefit a wide range of applications, especially those built on massive online collections of imagery, such as photo localization and scene reconstruction. Moreover, it can be applied to privacy protection by removing people from the scene.

Fig. 3. Scene reconstruction results in different settings: the first row shows the original images; the second row, panels (e)-(h), shows the corresponding result images.